Abstract. We have sequenced the complete plastid genome of the fern Angiopteris evecta. This 
taxon belongs to a major lineage (marattioid ferns) that, in most recent phylogenetic analyses, 
emerges near the base of the monilophytes. We used fluorescence activated cell sorting (FACS) to 
isolate organelles, rolling circle amplification (RCA) to amplify the plastid genome, followed by 
shotgun sequencing to 8X depth coverage, and then we assembled these reads to obtain the plastid 
genome sequence. The circular genome map has 153,901 bp, containing inverted repeats of 
21,053 bp each, a large single-copy region of 89,709 bp, and a small single-copy region of 
22,086 bp. Gene order is similar to that of Psilotum. Several unique characters are observed in the 
Angiopteris plastid genome, such as repeat structure in a pseudogene. We make structural 
comparisons to Psilotum and Adiantum plastid genomes. However, the overall structural similarity 
to Psilotum indicates either wholesale conservation of genome organization, or (less likely) 
repeated convergence to a stable structure. The results are discussed in relation to a growing 
comparative database of genomic and morphological characters across the green plants. 


Vascular plants first appear in the fossil record during the Silurian (Kenrick 
and Crane, 1997; Pryer et al., 2004a; Stewart and Rothwell, 1993). Although 
many major lineages are extinct, recent phylogenetic studies (Pryer et al., 2001) 
indicate that an early split resulted in two extant lineages: seed plants and 
monilophytes. The latter includes the leptosporangiate ferns, marattioid ferns, 
horsetails, and a clade that includes eusporangiate ferns and whisk ferns. How 
these four lineages are related to each other is still poorly understood (Pryer et 
al.; Wikström and Pryer, 2005). Resolving these 
phylogenetic nodes is important for understanding the evolution of morphological, 
genetic, and developmental systems in monilophytes. As part of an 
effort to provide data for addressing this issue, we sequenced the complete 
plastid genome of Angiopteris evecta (Marattiaceae). Currently, complete 
plastid genome sequences are available from only one leptosporangiate fern, 
Adiantum capillus-veneris L. (Wolf et al., 2003), and from only one other 
monilophyte, Psilotum nudum (L.) P. Beauv., whereas about 50 seed plant 
plastid genomes are currently in GenBank. Complete genome sequences can 
provide information on many levels, including genome structure, gene 
content, intron content, and nucleotide sequences of targeted regions. We 
chose Angiopteris evecta for our study because it is an easily available 
representative of an ancient lineage for which no plastid genome has been 
sequenced. Extant marattioid ferns include about 240 species (see Pryer et al., 
2004b) typically treated in four genera and one family (Smith et al., 2006). 
Marattioid ferns first appeared in the middle Carboniferous, and fossils 
assignable to the extant genus Marattia date to the late Triassic (Hill and 
Camus, 1986). Thus marattioid ferns represent a clade as significant as seed 
plants or leptosporangiate ferns in terms of age, though not in terms of extant 
diversity. 



Although the plastid genome is generally conserved in overall structure 
among land plants (Palmer, 1985), there is often sufficient variation for 
comparative analysis both at the structural and sequence levels. Large 
rearrangements, spanning several genes, are likely to be rare events that can 
be used as phylogenetic markers (Raubeson and Jansen, 1992). Early studies of 
fern chloroplast genomes uncovered a wealth of phylogenetic data and insights 
into the evolution of the genome (Hasebe and Iwatsuki, 1992; Raubeson and 
Stein, 1995; Stein et al., 1992; Stein et al., 1989). One significant finding from 
these studies was that a large portion of the plastid genome has been 
rearranged in ferns, but the exact series of events has not yet been fully 
characterized. Subsequently, there was a shift to more focused studies on DNA 
sequences of a few genes from large numbers of taxa (Hasebe et al., 1994; 
Hasebe et al., 1995; Pryer et al., 2004b). Thus, our understanding of structural 
evolution of fern plastid genomes remains limited. This study represents part 
of a broader investigation into plastid genome evolution by sequencing 
complete genomes or large portions thereof. Because Angiopteris represents 
a major lineage, details of its plastid genome can provide baseline data for this 
and other studies. Our objective here is to present the plastid genome sequence 
of Angiopteris evecta and compare it structurally to other monilophytes. 


Materials and Methods 


Preparation and DNA sequencing. Pinnules from an immature crozier of A. 
evecta were collected from a plant growing at the University of Washington, 
Seattle, WA (source unknown). Voucher specimens (UC 
1794629, 1794630, and 1794631) are deposited at the University of California 
Herbarium at Berkeley (UC). We collected purified fractions of intact 
chloroplasts from A. evecta by fluorescence-activated cell sorting (FACS). One 
hundred milligrams of fresh frond tissue was sliced into 0.25-1 mm segments 
in a sterile plastic Petri dish (on ice) in 1.0 mL of an organelle isolation 
solution containing 0.33 M sorbitol, 50 mM HEPES at pH 7.6, 2 mM EDTA, 
1 mM MgCl2, 0.1% BSA, 1% PVP-40, 1.5 M NaCl and 5 mM β-mercaptoethanol, 
adjusted to pH 7.6 with KOH. Suspended organelles (chloroplasts, 
mitochondria, and nuclei) were withdrawn using a wide-bore pipette then 
filtered through 30 µm nylon mesh. Organelles were then stained with DAPI 
(Sigma-Aldrich, St. Louis, MO, USA) and Mitotracker Green (Molecular Probes 
Inc., Eugene, OR, USA) at final concentrations of 2 µg/mL and 100 nM, 
respectively. The organelle suspension was incubated on ice for 15 min then 
analyzed on a FACS DiVa using sterile phosphate buffered solution (Invitrogen 
Inc., Carlsbad, CA, USA) as sheath fluid. We used a Coherent INNOVA 
Enterprise Ion laser (Coherent, Inc., Santa Paula, CA, USA) emitting a 488 nm 
beam at 275 mW to excite chlorophyll and Mitotracker Green, and a UV beam 
at 30 mW to excite DAPI. Red fluorescence from chlorophyll was passed 
through a 675±20 nm filter, held within the FL3 photomultiplier tube (PMT). 
Green fluorescence from Mitotracker Green was passed through a 530±30 nm 
filter held within the FL1 PMT. DAPI fluorescence from DNA was passed 
through a 424±44 nm filter held within the FL4 PMT. Organelles were 
collected into separate sterile 15 mL centrifuge tubes by flow cytometric sorting 
based on the respective sorting gates. Sorted organelles were pelleted at 3000 g 
for 15 min, flash frozen in liquid nitrogen, and shipped frozen for DNA 
isolation and amplification. 

The DNA preparation was processed for sequencing by the Production 
Genomics Facility of the DOE Joint Genome Institute (JGI). Template was first 
amplified via rolling circle amplification (RCA) with random hexamers (Dean 
et al., 2001). The RCA product was mechanically sheared into random 
fragments of about 3 kb by repeated passage through a Hydroshear device 
(Genemachines, San Carlos, CA, USA). These fragments were then enzymatically 
repaired to ensure blunt ends, then purified by gel electrophoresis to 
select for a narrow distribution of fragment sizes. Fragments were ligated into 
dephosphorylated pUC18 vector and transformed into E. coli to create plasmid 
libraries, using standard techniques (Sambrook et al., 1989). Automated colony 
pickers were used to select colonies into 384-well plates containing LB 
medium. After overnight incubation, a small aliquot was processed robotically 
by RCA of plasmids (Dean et al., 2001), then used as a template for DNA 
sequencing using Big-Dye chemistry (Applied Biosystems, Foster City, CA, 
USA). Sequencing reactions were cleaned using SPRI (Elkin et al., 2001) and 
separated electrophoretically on ABI 3730XL or MegaBACE 4000 automated 
DNA sequencing machines to produce a sequencing read from each end of 
each plasmid. 

Assembly and annotation. Sequences were processed using Phred (Ewing 
and Green, 1998; Ewing et al., 1998), trimmed for quality, screened for vector 
sequences, and assembled using Phrap. Quality scores were assigned 
automatically, and the electropherograms and assembly were viewed and 
verified for accuracy using Consed 12 (Gordon et al., 1998). As is typical, 
manual input was required to reconstruct part of one of the inverted repeat 
regions, since automated assembly methods cannot recognize these as 
different. Regions of low quality or coverage and several gaps were reamplified 
by PCR and then sequenced. We designed primers from the ends of the longest 
contigs and used ProofStart long-PCR (QIAGEN, Valencia, CA, USA) to amplify 
the missing regions. Reagent concentrations and amplification conditions 
followed the manufacturer's instructions and we used PCR extension times of 
1 min/kb of estimated PCR product. PCR products were digested with Tsp509I 
(compatible overhang with EcoRI) and Sau3AI (compatible overhang with 
BamHI). The fragments were separated in agarose, visualized, and cut from the 
gels. These fragments were then cloned into pUC19, end-sequenced, and added 
to the previous assembly. If assembly of a gap was incomplete at this stage, 
then primers were designed from the subclone fragment sequences above and 
used to sequence the appropriate region using the earlier long-PCR product as 
a template. In this way we closed all 12 gaps. The final assembly has an 
average depth of coverage of 8X. We assembled the sequence as a circular 
genome with two copies of the inverted repeat. 

We annotated the genome 
using DOGMA (Dual Organellar GenoMe Annotator) (Wyman et al., 2004). 
Genes were located by using a database of previously published chloroplast 
genomes, from which BLAST searches (Altschul et al., 1997) are used to find 
approximate gene positions. From this initial annotation, we located 
starts, stops, and intron positions based on comparisons to 
homologous genes in other chloroplast genomes. We also took into account the 
possibility of RNA editing, which can modify the start and stop positions. 

For the repeat analysis with REPuter, we set the minimum 
repeat size to 20 and analyzed the sequence with only one copy of the inverted 
repeat. 
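The relationship between read count and the 8X average depth reported above can be sketched with the standard definition of mean coverage, c = N × L / G. The genome length is taken from the text; the mean read length of 700 bp is purely an assumed placeholder, since read lengths are not stated, and the uniform-random (Poisson) model is an idealization that real shotgun libraries only approximate.

```python
import math

GENOME_BP = 153_901      # genome length reported in the text
ASSUMED_READ_BP = 700    # hypothetical mean usable read length (not from the text)


def mean_depth(n_reads: int, read_bp: float, genome_bp: int) -> float:
    """Mean depth of coverage, c = N * L / G."""
    return n_reads * read_bp / genome_bp


def reads_for_depth(depth: float, read_bp: float, genome_bp: int) -> int:
    """Smallest number of reads whose mean depth reaches `depth`."""
    return math.ceil(depth * genome_bp / read_bp)


def expected_uncovered_fraction(depth: float) -> float:
    """Fraction of bases with no read under an idealized Poisson model, e^(-c)."""
    return math.exp(-depth)


n_reads = reads_for_depth(8, ASSUMED_READ_BP, GENOME_BP)
print("reads needed under the assumed read length:", n_reads)
print("expected uncovered fraction at 8X:", expected_uncovered_fraction(8))
```

Even at 8X the idealized model leaves a small expected uncovered fraction, and cloning or amplification bias can leave more, which is consistent with the need to close remaining gaps by targeted PCR.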


Results and Discussion 


The plastid genome of Angiopteris evecta has 153,901 bp, with inverted 
repeats (IRa and IRb) of 21,053 bp each, a large single-copy (LSC) region of 
89,709 bp, and a small single-copy (SSC) region of 22,086 bp (Fig. 1). The 
sequence and annotation are deposited in GenBank as accession number 
DQ821119. During annotation of the genome, we located the repertoire 
of genes that is typical of land plant plastid genomes. The overall organization 
of the Angiopteris plastid genome is typical of other vascular plants and most 
similar to that of Psilotum nudum among plastid genomes sequenced to date. 
Some of the differences between Angiopteris and Psilotum are possibly 
a function of autapomorphies in either lineage, but this cannot be determined 
until more plastid genomes are examined. For example, Psilotum lacks three 
genes (chlL, chlN, and chlB) for subunits of protochlorophyllide reductase, an enzyme 
involved in the light-independent formation of chlorophyll. These three genes 
are found in most other plastid genomes, including Angiopteris. The ends of 
the IR also vary considerably among vascular plants. Psilotum differs from 
Angiopteris in that the SSC-IR boundary in the former is near trnL-UAG and 
the SSC extends from ccsA to ycf1, whereas in Angiopteris the SSC is longer 
and extends from ndhF to chlL (Fig. 2). Gene order at the LSC-IR boundary of 
Angiopteris is very similar to that of Psilotum, differing only in the sizes of 
intergenic regions rather than gene positions (Fig. 2). The overall gene order 
within the IR is similar to that of seed plants and Psilotum, consistent with the 
hypothesis that this region sustained several rearrangements at some time 
during the diversification of leptosporangiate ferns (Hasebe and Iwatsuki, 
1992; Stein et al., 1992). An inversion of about 3 kb, involving psbD, psbC, and 
psbZ, was previously detected in Psilotum and Adiantum relative to other land 
plants, and more recently documented in the plastid genome of Equisetum (K. 
Karol, personal communication). This inversion is also seen in Angiopteris, 
thus providing a potential phylogenetic marker for the monilophyte clade. 

Fig. 1. Circular genome map of the plastid genome of Angiopteris evecta. Genes appearing on the 
outside of the circle are transcribed clockwise; genes on the inside are transcribed counterclockwise. 
Ψ denotes putative pseudogene. A 10 kb region is marked on the right to show the scale. 
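As a quick consistency check on the region sizes reported above, the four parts of the genome should sum to the total length. The sketch below also derives 1-based coordinates for each region. The numbering origin (first base of the LSC) and the order of the two IR copies are assumptions made for illustration; they are labeled generically rather than as IRa and IRb.

```python
# Region lengths in bp, as reported in the text. The order around the circle
# and the numbering origin (first base of the LSC) are assumptions.
REGION_LENGTHS = [
    ("LSC", 89_709),
    ("IR copy 1", 21_053),
    ("SSC", 22_086),
    ("IR copy 2", 21_053),
]
REPORTED_TOTAL = 153_901


def region_coordinates(regions, first_position=1):
    """Return {name: (start, end)} with 1-based inclusive coordinates."""
    coords = {}
    start = first_position
    for name, length in regions:
        coords[name] = (start, start + length - 1)
        start += length
    return coords


total = sum(length for _, length in REGION_LENGTHS)
coords = region_coordinates(REGION_LENGTHS)

print("sum of regions:", total, "matches reported total:", total == REPORTED_TOTAL)
for name, (start, end) in coords.items():
    print(f"{name}: {start}-{end}")
```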
Another region of interest is in the LSC between rpoB and psbZ. This region 
has the same gene order in Psilotum and Angiopteris so it is likely to be an 
ancestral monilophyte organization. The Adiantum gene order differs from 
that of Angiopteris and Psilotum in this region. However, the gene order 
difference cannot be explained by a single inversion. Instead, at least two 
overlapping inversions are required to explain the variation. Fig. 4 presents 
two alternative most-parsimonious pathways from a putative ancestral 
monilophyte gene order to that of Adiantum. Analysis of this region from 
several clades of leptosporangiate ferns may help determine which sequence 
of events occurred. 
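The reasoning about overlapping inversions can be illustrated with a signed gene order, in which an inversion reverses a contiguous block of genes and flips the strand of each. The sketch below uses placeholder genes A to E, not the actual rpoB to psbZ gene order, and shows a derived arrangement that two overlapping inversions produce but that no single inversion can reach.

```python
from typing import List, Tuple

Gene = Tuple[str, int]  # (name, strand), strand is +1 or -1


def invert(order: List[Gene], i: int, j: int) -> List[Gene]:
    """Invert the block order[i:j]: reverse gene order and flip each strand."""
    block = [(name, -strand) for name, strand in reversed(order[i:j])]
    return order[:i] + block + order[j:]


def single_inversion_neighbors(order: List[Gene]) -> List[List[Gene]]:
    """All gene orders reachable from `order` by exactly one inversion."""
    n = len(order)
    return [invert(order, i, j) for i in range(n) for j in range(i + 1, n + 1)]


# Hypothetical ancestral order; gene names are placeholders.
ancestral = [(g, 1) for g in "ABCDE"]

step1 = invert(ancestral, 0, 3)   # first inversion covers A-C
derived = invert(step1, 1, 5)     # second inversion overlaps the first

print("derived order:", derived)
print("reachable by one inversion:", derived in single_inversion_neighbors(ancestral))
```

An exhaustive search over single inversions is feasible here because a linear order of n genes has only n(n+1)/2 possible inversions.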

One gene that we have not annotated is that for the hypothetical protein 
ycf68. Although found in several land plant plastid genomes, this gene is 
usually not annotated, perhaps because it is a relatively short reading frame 
(approximately 300 bp) and its function is unknown. In Angiopteris, ycf68 is 
located in the IR at positions 104265-104639 and 139346-138972. However, 
there are at least three frameshifts, suggesting that ycf68 is a pseudogene. 
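The two coordinate pairs given for ycf68 are mirror images because the gene lies in both copies of the inverted repeat. A minimal sketch of that mapping follows, assuming the sequence is numbered from the first base of the LSC and that the regions follow in the order LSC, IR, SSC, IR. Neither assumption is stated in the text, but together they reproduce the reported coordinates.

```python
# Region lengths in bp, from the text.
LSC_BP, IR_BP, SSC_BP = 89_709, 21_053, 22_086
GENOME_BP = LSC_BP + 2 * IR_BP + SSC_BP

# Assumed layout: numbering starts at the first base of the LSC.
IR1 = (LSC_BP + 1, LSC_BP + IR_BP)            # first IR copy
IR2 = (GENOME_BP - IR_BP + 1, GENOME_BP)      # second IR copy


def mirror_position(pos: int) -> int:
    """Map a position in one IR copy to the matching base in the other copy."""
    in_ir1 = IR1[0] <= pos <= IR1[1]
    in_ir2 = IR2[0] <= pos <= IR2[1]
    if not (in_ir1 or in_ir2):
        raise ValueError(f"position {pos} is not inside an inverted repeat")
    # The copies are reverse complements, so matching positions sum to a constant.
    return IR1[0] + IR2[1] - pos


def span_bp(a: int, b: int) -> int:
    """Length of an inclusive interval given in either orientation."""
    return abs(b - a) + 1


print(mirror_position(104_265), mirror_position(104_639))
print(span_bp(104_265, 104_639), span_bp(139_346, 138_972))
```

The second coordinate pair is written high to low because that copy of the reading frame lies on the opposite strand.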

The Angiopteris plastid genome contains several regions with repeat 
structure. Results from the analysis by REPuter revealed two main regions of 
long repeats (more than 20 bp). We found an 817 bp direct repeat within the 
region annotated as the pseudogene for the hypothetical protein ycf1 in the 
SSC, as well as a 352 bp string with a 95% similarity to the reverse 
complement also in the same region. Either several duplications or inversions 
resulted in ycf1 becoming a pseudogene, or its loss of function lifted selective 
constraints against such structural rearrangements. The remaining repeat 
regions were all at the beginning of the IR between trnI and trnL. This region is 
highly variable in several plastid genomes, probably due to the creation of 
partial genes during expansion and contraction of the IR (Goulding et al., 1996; 
Palmer, 1991). 

Fig. 3. Diagram of the LSC-IR boundaries, comparing that of Psilotum nudum with 
Angiopteris evecta. 
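To make the distinction between direct repeats and reverse-complement repeats concrete, the following toy search indexes every k-mer in a sequence and reports exact matches of both kinds. It is only a sketch of the idea: it is not REPuter's algorithm, it finds exact matches only (so it would miss a 95% similar repeat), and the test sequence is invented.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_repeats(seq: str, k: int):
    """Return (direct, inverted) lists of 0-based start-position pairs.

    direct:   the same k-mer occurs at both positions.
    inverted: the k-mer at the second position is the reverse complement
              of the k-mer at the first position.
    """
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    direct = [
        (a, b)
        for positions in index.values()
        for n, a in enumerate(positions)
        for b in positions[n + 1:]
    ]

    inverted = []
    for kmer, positions in index.items():
        rc = revcomp(kmer)
        if rc in index:
            inverted.extend((a, b) for a in positions for b in index[rc] if a < b)

    return sorted(direct), sorted(inverted)


# Invented example: one unit, a direct copy, and a reverse-complement copy.
unit = "AAACCCGT"
toy = unit + "TTT" + unit + "TTT" + revcomp(unit)
direct_hits, inverted_hits = exact_repeats(toy, k=8)
print("direct:", direct_hits)
print("inverted:", inverted_hits)
```

Analyzing the genome with only one IR copy, as described in the methods, avoids having the inverted repeat itself dominate the list of reverse-complement hits.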

We found no stop codons within otherwise open reading frames and no 
other obvious indications that RNA editing would be required. The ycf1 
pseudogene was too drastically different from heterologous ycf1 sequences to 
explain the differences by RNA editing. However, absence of evidence is not 
evidence of absence; RNA editing can only be tested by sequencing cDNAs. 
This has only been done systematically for two chloroplast genes (ndhB and 
rbcL) in all major lineages of land plants (Freyer et al., 1997). The complete set 
of transcripts from chloroplast genes has only been examined in one hornwort, 
Anthoceros (Kugita et al., 2003), and one leptosporangiate fern, Adiantum 
capillus-veneris (Wolf et al., 2004), for which 350 RNA edited sites were 
detected. Thus, it remains unclear whether high levels of RNA editing are 
derived or ancestral within monilophytes. 


Why have the plastid genome structures of Angiopteris and Psilotum 
remained so constant over such a long period of evolutionary time? The most 
recent common ancestor of Angiopteris and Psilotum probably lived over 
400 million years ago (Pryer et al.). Plastid genome structure has 
evolved rapidly in several younger clades such as Geranium (Palmer et al., 
1987a) and Campanulaceae (Cosner et al., 2004). Some events have been 
correlated with loss of structural stability, such as loss of the inverted repeat 
(Palmer et al., 1987b). Clearly, plastid genome structure does not evolve in 
a clock-like manner. In fact, it is for this reason that structural changes can 
provide useful phylogenetic markers. Gene order can take on many possible 
states whereas DNA sequences have only four states. Thus, structural changes 
are more complex than nucleotide substitutions: reversion to an ancestral gene 
order is unlikely compared to reversion to an ancestral base in the DNA 
sequence. Long evolutionary branches have, on average, more opportunity to 
accumulate changes. However, the non-clock-like nature of structural changes 
provides a chance for them to become phylogenetic markers on short branches 
where signal is weak in DNA sequence data. The conservation of plastid 
genome structure between Angiopteris and Psilotum is either a function of long 
term stability in both lineages, or independent (and perhaps repeated) 
convergence to a more stable structure and gene order. Distinguishing these 
hypotheses can only be achieved with more genome structural data from 
additional clades, and such information is needed if we are to understand 
more about the levels of homoplasy for structural genomic characters. 
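The contrast drawn above between gene order and nucleotide characters can be made numerical. A nucleotide site has four states, whereas n genes on a linear map, each on either strand, can be arranged in n! × 2^n ways, and any one of n(n+1)/2 contiguous blocks can be inverted. The sketch below computes these counts; treating the map as linear and all inversions as equally likely is a simplifying assumption for illustration, not a model used in the study.

```python
import math

NUCLEOTIDE_STATES = 4


def signed_gene_orders(n_genes: int) -> int:
    """Number of linear arrangements of n genes, each on either strand."""
    return math.factorial(n_genes) * 2 ** n_genes


def possible_inversions(n_genes: int) -> int:
    """Number of contiguous blocks (length >= 1) that could be inverted."""
    return n_genes * (n_genes + 1) // 2


for n in (3, 5, 10):
    print(n, "genes:", signed_gene_orders(n), "orders,",
          possible_inversions(n), "possible inversions")

# Under the crude equal-probability assumption, undoing a given inversion
# requires the same block to invert again: 1 chance in possible_inversions(n),
# compared with 1 in 3 for a substitution reverting to the ancestral base.
```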

If the plastid genome structure of Angiopteris has indeed remained constant 
since the origin of monilophytes, this would correlate with other evolutionary 
[...] Marattiaceae [...] and in the tree fern clade (Soltis et al., 2002). If genome structure has also been 
[...] Marattiaceae [...] rates for morphology, DNA sequences, and genome structure. Testing for such 
a correlation would require more data on genome structure, including nuclear 
genomes, from more taxa. 

Angiopteris represents an ancient lineage whose affinities to other 
[...] Most [...] (Wikström and Pryer, 2005). If so, this would be an ancient clade, with little 
signal remaining. Data are forthcoming on the plastid genome of Equisetum, in 
addition to the mitochondrial genomes of Angiopteris and Equisetum (K. 
Karol, personal communication), and it is hoped that additional phylogenetic 
information will soon be provided. 

The circular diagram depicted in Fig. 1 is, like all such genome maps, 
a visual representation of something far more complex. One unusual feature of 
plastid genomes is that the LSC and SSC have alternative orientations relative 
to the IR within a single organelle (Palmer, 1983). This so-called flip-flop 
recombination has also been documented for plastid genomes of the fern 
Osmunda (Stein et al., 1986). Thus, the relative orientations of the LSC and 
SSC in any map are arbitrary. Furthermore, experiments with native 
chloroplast genomes indicate that, at least in some situations, most molecules 
are linear and some even branched, with few displaying the more familiar 
circular structure depicted in most maps (Oldenburg and Bendich, 2004). 
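The two flip-flop isomers described above differ only in the orientation of one single-copy region relative to the other. A minimal sketch follows, using short invented strings in place of the real LSC, IR, and SSC sequences and writing each isomer as a linearized circle that starts at the LSC.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_flop_isomers(lsc: str, ir: str, ssc: str):
    """Return both isomers of a genome laid out as LSC, IR, SSC, inverted IR.

    Recombination between the two IR copies inverts the segment between them,
    so the second isomer carries the SSC as its reverse complement.
    """
    isomer_a = lsc + ir + ssc + revcomp(ir)
    isomer_b = lsc + ir + revcomp(ssc) + revcomp(ir)
    return isomer_a, isomer_b


# Invented toy sequences, not real plastid regions.
toy_lsc, toy_ir, toy_ssc = "AAAACCCC", "GATTACA", "ACGGT"
iso_a, iso_b = flip_flop_isomers(toy_lsc, toy_ir, toy_ssc)
print(iso_a)
print(iso_b)
```

Both strings have the same length and gene content, which is why either orientation is an equally valid way to draw the map.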


We provide here the first complete plastid genome sequence from the 
marattioid clade of plants. Availability of this sequence can enable researchers 
to design conserved primers to PCR-amplify and sequence new genomic 
regions that could provide useful phylogenetic information not available from 
the array of regions usually studied in ferns (Small et al., 2005). In addition, 
the structural details of the Angiopteris plastid genome join a growing database 
from other green plants. Ultimately such data can be used to infer phylogeny as 
well as help understand evolutionary process at both the sequence and genome 
structural levels. 